Public Library of Science (PLOS)

Adjacent Nucleotide Dependence in ncRNA and Order-1 SCFG for ncRNA Identification

Author: AE Walter
C Liu
DN Frank
EP Nawrocki
J Kim
P Larsson
R Backofen
RJ Klein
S Griffiths-Jones
S Washietl
Siu-Ming Yiu
SR Eddy
T Lowe
T Xia
Tak-Wah Lam
Thomas K. F. Wong
Thomas Mailund
VT Nguyen
Wing-Kin Sung
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: Non-coding RNAs (ncRNAs) are known to be involved in many critical biological processes, and identifying ncRNAs is an important task in biological research. Infernal, a popular software package, is the most successful prediction tool and exhibits high sensitivity. Its application has mainly focused on small suspected regions. We tried to apply Infernal at the chromosome level; the results have high sensitivity, yet contain many false positives. Further enhancing Infernal for chromosome-level or genome-wide study is desirable. Methodology: Based on the conjecture that adjacent nucleotide dependence affects the stability of the secondary structure of an ncRNA, we first conduct a systematic study on human ncRNAs and find that adjacent nucleotide dependence in human ncRNA should be useful for identifying ncRNAs. We then incorporate this dependence into the SCFG model and develop a new order-1 SCFG model for identifying ncRNAs. Conclusions: In our experiments on human chromosomes, the proposed new model eliminates more than 50% of the false positives reported by Infernal while maintaining the same sensitivity. The executable and the source code of programs are freely available a
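
The idea of adjacent nucleotide dependence mentioned in the abstract above can be illustrated with a generic order-1 statistic: the ratio of each observed dinucleotide frequency to the frequency expected if neighbouring bases were independent. This is only a minimal sketch under that assumption; it is not the paper's order-1 SCFG, and the function name `adjacent_dependence` is hypothetical.

```python
from collections import Counter


def adjacent_dependence(seqs):
    """Return observed/expected ratios for each adjacent nucleotide pair.

    A ratio far from 1.0 suggests the second base depends on the first.
    Generic order-1 statistic for illustration only; this is NOT the
    order-1 SCFG model described in the abstract.
    """
    mono = Counter()  # single-nucleotide counts
    di = Counter()    # adjacent-pair (dinucleotide) counts
    for s in seqs:
        s = s.upper()
        mono.update(s)
        di.update(s[i:i + 2] for i in range(len(s) - 1))

    n_mono = sum(mono.values())
    n_di = sum(di.values())
    ratios = {}
    for pair, count in di.items():
        # Expected pair frequency if the two positions were independent.
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        ratios[pair] = (count / n_di) / expected
    return ratios


if __name__ == "__main__":
    for pair, ratio in sorted(adjacent_dependence(["ACACGGUA", "GGCACC"]).items()):
        print(pair, round(ratio, 2))
```

Pairs that never occur are simply absent from the result; a real analysis would also need pseudocounts and a significance test before drawing conclusions from small samples.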

CiteSeerX

HKU Scholars Hub

ScholarBank@NUS

Genome re-annotation: a wiki solution?

Author: AL Delcher
AV Lukashin
JC Venter
JD Peterson
O White
RD Fleischmann
SF Altschul
SR Eddy
Steven L Salzberg
The International Human Genome Sequencing Consortium
Publication venue: BioMed Central
Publication date: 01/02/2007

The annotation of most genomes becomes outdated over time, owing in part to our ever-improving knowledge of genomes and in part to improvements in bioinformatics software. Unfortunately, annotation is rarely if ever updated, and resources to support routine reannotation are scarce. Wiki software, which would allow many scientists to edit each genome's annotation, offers one possible solution.

Digital Repository at the University of Maryland

Directed acyclic graph kernels for structural RNA analysis

Author: B Knudsen
B Schölkopf
CB Do
D Haussler
D Sankoff
DB Searls
DM Tax
E Rivas
EK Freyhult
H Kiryu
H Saigo
I Holmes
IL Hofacker
J Hertel
JD Thompson
JS McCaskill
JS Pedersen
JW Brown
K Sato
Kengo Sato
Kiyoshi Asai
MA Rosenblad
P Pacheco
RD Dowell
RE Fan
RJ Klein
S Washietl
S Will
SR Eddy
T Babak
T Kin
Toutai Mituyama
W Deng
Y Sakakibara
Yasubumi Sakakibara
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between two RNA sequences from the viewpoint of secondary structures. However, applying stem kernels directly to large data sets of ncRNAs is impractical due to their computational complexity. Results: We have developed a new technique based on directed acyclic graphs (DAGs) derived from base-pairing probability matrices of RNA sequences that significantly increases the computation speed of stem kernels. Furthermore, we propose profile-profile stem kernels for multiple alignments of RNA sequences which utilize base-pairing probability matrices for multiple alignments instead of those for individual sequences. Our kernels outperformed the existing methods with respect to the detection of known ncRNAs and kernel hierarchical clustering. Conclusion: Stem kernels can be utilized as a reliable similarity measure of structural RNAs, and can be used in various kernel-based applications.

arXiv.org e-Print Archive

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger
A Siepel
A Varadarajan
AJ Drummond
B Knudsen
Christos A. Ouzounis
D Ayres
DB Searls
E Birney
G Lunter
GSC Slater
Ian Holmes
IM Meyer
J Felsenstein
J Goecks
J Watts
JS Pedersen
L Stein
M Garber
M Hasegawa
M Kimura
M Zuker
ME Skinner
N Saitou
O Penn
Oscar Westesson
PS Klosterman
RK Bradley
SR Eddy
TH Jukes
WJ Kent
Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 16/02/2012

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminolog

Public Library of Science (PLOS)

FigShare

smyRNA: A Novel Ab Initio ncRNA Gene Finder

Author: A Coventry
A Fontaine
C Dieterich
Cagri Aksay
D di Bernardo
DP Bartel
E Bonnet
E Rivas
Emre Karakoc
G Storz
IL Hofacker
IM Meyer
Iman Hajirasouliha
J Thompson
JS Pedersen
M Margulies
Peter J. Unrau
Raheleh Salari
RJ Carter
S Griffiths-Jones
S Washietl
S. Cenk Sahinalp
SR Eddy
Stefan Maas
Z Yao
Publication venue: Public Library of Science
Publication date: 01/01/2008

Background: Non-coding RNAs (ncRNAs) have important functional roles in the cell: for example, they regulate gene expression by establishing stable joint structures with target mRNAs via complementary sequence motifs. Sequence motifs are also important determinants of the structure of ncRNAs. Although ncRNAs are abundant, discovering novel ncRNAs in genome sequences has proven to be a hard task; in particular, past attempts at ab initio ncRNA search have mostly failed, with the exception of tools that can identify microRNAs. Methodology/Principal Findings: We present a very general ab initio ncRNA gene finder that exploits differential distributions of sequence motifs between ncRNAs and background genome sequences. Conclusions/Significance: Our method, once trained on a set of ncRNAs from a given species, can be applied to the genome sequences of other organisms to find not only ncRNAs homologous to those in the training set but also others that potentially belong to novel (and perhaps unknown) ncRNA families. Availability
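
One simple way to picture "differential distributions of sequence motifs" is a k-mer log-odds table: count short motifs in known ncRNAs and in background sequence, then score a candidate window by summing the log ratios. The sketch below assumes that simplification; it is not smyRNA's actual algorithm, and the names `kmer_log_odds` and `score_window`, the DNA alphabet, and the pseudocount are all illustrative choices.

```python
import math
from collections import Counter
from itertools import product


def kmer_log_odds(positives, background, k=3, pseudocount=1.0):
    """Build a log-odds table log(P(kmer | ncRNA) / P(kmer | background)).

    Illustrative only: a plain k-mer model with add-one style smoothing,
    not the motif model used by smyRNA.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]

    def frequencies(seqs):
        counts = Counter()
        for s in seqs:
            s = s.upper()
            counts.update(s[i:i + k] for i in range(len(s) - k + 1))
        # Only k-mers over the ACGT alphabet contribute to the total.
        total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
        return {m: (counts[m] + pseudocount) / total for m in kmers}

    pos = frequencies(positives)
    bg = frequencies(background)
    return {m: math.log(pos[m] / bg[m]) for m in kmers}


def score_window(seq, table, k=3):
    """Sum log-odds over all k-mers in seq; unknown k-mers score 0."""
    seq = seq.upper()
    return sum(table.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    table = kmer_log_odds(["GGGCCCGGGCCC"], ["ATATATTTAAAT"], k=3)
    print(score_window("GGGCCC", table) > score_window("ATATAT", table))
```

A positive window score means the window's motifs look more like the training ncRNAs than like the background; choosing a threshold would require held-out data.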

CiteSeerX

Simon Fraser University Institutional Repository

Sequence-Based Classification of Select Agents: A Brighter Line

Author: Baric R
Breeze RG
Buller RM
Eddy SR
Falkow S
Leduc JW
Levinson RE
Mulligan J
O'Brien AD
Ochoa-Corona F
Richardson JS
Riley M
Slezak T
Publication venue: National Academies Press
Publication date

DukeSpace

Animal Ca2+ release-activated Ca2+ (CRAC) channels appear to be homologous to and derived from the ubiquitous cation diffusion facilitators

Author: A Krogh
B Montanini
B Wang
C Peinelt
CJ Haney
CW MacDiarmid
DL Jack
Dorjee G Tamang
G Dunn
IT Paulsen
J Devereux
J Soboloff
JC Mercer
JD Thompson
Kenny M Gomolplitinant
KT Cheng
M Lu
M Vig
M Yamashita
Madeleine G Matias
MD Cahalan
MH Saier Jr
Milton H Saier
MO Dayhoff
MR Yen
NM Mansour
O Mignen
PG Hogan
R Durbin
R Hughey
RA Cragg
RF Doolittle
RJ Cousins
RT Williams
S Feske
SF Altschul
SR Eddy
W Li
X Cai
X Zhou
Y Baba
Y Chao
Y Wei
Y Zhai
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Antigen stimulation of immune cells triggers Ca2+ entry through Ca2+ release-activated Ca2+ (CRAC) channels, promoting an immune response to pathogens. Defects in a CRAC (Orai) channel in humans give rise to the hereditary Severe Combined Immune Deficiency (SCID) syndrome. We here report results that define the evolutionary relationship of the CRAC channel proteins of animals and the ubiquitous Cation Diffusion Facilitator (CDF) carrier proteins. Findings: CDF antiporters derived from a primordial 2 transmembrane spanner (TMS) hairpin structure by intragenic triplication to yield 6 TMS proteins. Four programs (IC/GAP, GGSEARCH, HMMER and SAM) were evaluated for identifying sequence similarity and establishing homology using statistical means. Overall, the order of sensitivity (similarity detection) was IC/GAP = GGSEARCH > HMMER > SAM, but the use of all four programs was superior to the use of any two or three of them. Members of the CDF family appeared to be homologous to members of the 4 TMS Orai channel proteins. Conclusions: CRAC channels derived from CDF carriers by loss of the first two TMSs of the latter. Based on statistical analyses with multiple programs, TMSs 3-6 in CDF carriers are homologous to TMSs 1-4 in CRAC channels, and the former was the precursor of the latter. This is an unusual example of how a functionally and structurally more complex protein may have predated a simpler one.

eScholarship - University of California

How accurately is ncRNA aligned within whole-genome multiple alignments?

Author: A Prakash
A Siepel
Adrienne X Wang
DA Pollard
E Rivas
E Torarinsson
EH Margulies
G Bourque
J Pei
JD Thompson
L Wang
M Blanchette
M Brudno
M Cline
M Errami
Martin Tompa
MS Rosenberg
S Batzoglou
S Griffiths-Jones
S Karlin
S Kumar
S Schwartz
S Washietl
SR Eddy
T Lassmann
W Miller
Walter L Ruzzo
WJ Kent
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Multiple alignment of homologous DNA sequences is of great interest to biologists since it provides a window into evolutionary processes. At present, the accuracy of whole-genome multiple alignments, particularly in noncoding regions, has not been thoroughly evaluated. Results: We evaluate the alignment accuracy of certain noncoding regions using noncoding RNA alignments from Rfam as a reference. We inspect the MULTIZ 17-vertebrate alignment from the UCSC Genome Browser for all the human sequences in the Rfam seed alignments. In particular, we find 638 instances of chimeric and partial alignments to human noncoding RNA elements, of which at least 225 can be improved by straightforward means. As a byproduct of our procedure, we predict many novel instances of known ncRNA families that are suggested by the alignment. Conclusion: MULTIZ does a fairly accurate job of aligning these genomes in these difficult regions. However, our experiments indicate that better alignments exist in some regions.

EST-PAC a web package for EST annotation and protein sequence prediction

Author: A Bateman
A Hotz-Wagenblatt
C Iseli
C Lottaz
C Mao
Christophe Lefèvre
David Powell
E Dias Neto
G Wistow
J Parkinson
JD Wasmuth
LD Hillier
LK Matukumalli
MD Adams
P Ayoubi
S McGinnis
SR Eddy
Yvan Strahm
Publication venue: BioMed Central
Publication date: 01/01/2006

With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform-independent, easy to customize, and amenable to using distributed bioinformatics resources available on the Internet. To address these issues, we present EST-PAC, a web-oriented multi-platform software package for EST annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using Blast services, 2) predicting protein coding sequence from EST data, and 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results, and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics.